Abstract

Background: A large number of studies have been carried out to obtain amino acid propensities for α-helices and
β-sheets. The obtained propensities for α-helices are consistent with each other, and the pair-wise correlation
coefficient is frequently high. On the other hand, the β-sheet propensities obtained by several studies differed
significantly, indicating that the context significantly affects β-sheet propensity.

Results: We calculated amino acid propensities for α-helices and β-sheets for 39 and 24 protein folds, respectively,
and addressed whether they correlate with the fold. The propensities were also calculated separately for exposed and
buried sites. The results showed that α-helix propensities do not differ significantly by fold, but β-sheet
propensities are diverse and depend on the fold. The propensities calculated for exposed sites and buried sites are
similar for α-helices, but such is not the case for β-sheets. We also found some fold dependence of
amino acid frequency in β-strands. Folds with a high Ser, Thr and Asn content at exposed sites in β-strands tend to
have a low Leu, Ile, Glu, Lys and Arg content (correlation coefficient = -0.90) and to have flat β-sheets. At buried
sites in β-strands, the content of Tyr, Trp, Gln and Ser correlates negatively with the content of Val, Ile and Leu
(correlation coefficient = -0.93). "All-β" proteins tend to have a higher content of Tyr, Trp, Gln and Ser, whereas
"α/β" proteins tend to have a higher content of Val, Ile and Leu.

Conclusions: The α-helix propensities are similar for all folds and for exposed and buried residues. However,
β-sheet propensities calculated for exposed residues differ from those for buried residues, indicating that the
exposed-residue fraction is one of the major factors governing amino acid composition in β-strands. Furthermore,
the correlations we detected suggest that amino acid composition is related to folding properties such as the twist
of a β-strand or the association between two β-sheets.



Background 

In 1974, Chou and Fasman published the calculated frequency of occurrence and conformational propensity of each amino acid in the secondary structures of 15 proteins, consisting of 2473 amino acid residues [1]. Since then, a vast number of protein structures have been determined and classified to reflect both structural and evolutionary relatedness [2,3]. The SCOP (Structural Classification of Proteins) classification is one of the major databases that provide a detailed and comprehensive description of the relationships among all known protein structures. The classification is hierarchical: the first two levels, family and superfamily, describe near and far evolutionary relationships; the third, fold, describes geometrical relationships. Most of the folds (899/1086) are assigned to one of the four structural classes: "all-α", "all-β", "α/β" (for proteins with α-helices and β-strands that are largely interspersed) and "α + β" (for those in which α-helices and β-strands are largely segregated). The remaining folds are assigned to the "Multi-domain", "Membrane and cell surface" or "Small" protein classes. In 2009, we developed a quaternary structural database for proteins, OLIGAMI [4], in which oligomer information was added to the SCOP classification [2] to allow an exhaustive survey of the tertiary or quaternary structures of proteins.

A large number of studies have been carried out to obtain amino acid propensities for α-helices and β-sheets [1,5-28]. The propensities have been estimated from statistical analysis of three-dimensional structures [1,6-15], experimental determination of α-helix or β-sheet content in peptides [16-23], and experimental determination of the thermodynamic stability of mutant proteins [23-28]. The α-helix propensities obtained are consistent between studies, with the pair-wise correlation coefficient (R) frequently being >0.8, although Richardson et al. [7] and Engel et al. [12] showed that amino acid propensities differ among specific positions within an α-helix, in a manner that depends on the amino acid. Engel et al. also showed that most helices are amphiphilic and have a strong tendency to both begin and end on the solvent-inaccessible face of the α-helix, suggesting that the propensities for α-helix differ between solvent-accessible and solvent-inaccessible faces. On the other hand, the β-sheet propensities obtained by several studies differ significantly, indicating that the context significantly affects β-sheet propensity. β-Sheets consist of various combinations of β-strands, differing in the number of strands and in whether the sheet is parallel, antiparallel or mixed. For the IgG-binding domain from protein G, which has four antiparallel β-strands, Minor and Kim showed that the β-sheet propensity measured at the center strand [27] differs significantly from that measured at an edge strand [28]. This context-dependent nature of the β-sheet propensity may be reflected in its dependence on the overall protein fold. Previously, Jiang et al. [10] and Costantini et al. [13] calculated the secondary structure propensities for the four protein structural classes, "all-α", "all-β", "α/β" and "α+β", and showed that β-sheet propensity depends on the structural class. However, it has not been clarified what kind of context gives rise to these dependencies, since each structural class contains various folds with different contexts. It is therefore of interest to address whether the propensity of each amino acid varies depending on the fold type.

In this study, to clarify the relationship between amino acid propensity and context in more detail, we calculated the occurrence of each amino acid residue in α-helical and β-strand conformations as a function of the SCOP fold of the protein (i.e. at a lower structural level than previously addressed), and categorized the residues as exposed to solvent or buried in the interior. The results indicate that α-helix propensities do not differ significantly by fold but that β-sheet propensities are diverse and indeed depend on the fold. Furthermore, by analyzing correlations between protein fold and amino acid propensity, we found some relationships between structural features and amino acid composition.

Methods 

Selecting protein structures to be included in the dataset 

This study uses sets of non-redundant PDB entries (three-dimensional coordinates) for each fold type. To facilitate the analysis, we extracted monomeric or homo-oligomeric, single-domain proteins from the PDB. This was accomplished with OLIGAMI (http://protein.tsoka.ac.jp/oligami/) [4], a database that combines the SCOP database (Structural Classification of Proteins) [2] with oligomeric information. From these coordinates, a non-redundant subset of PDB entries (in which no pair of structures had >60% sequence identity) was created for each fold of the four main SCOP classes of proteins: "all-α", "all-β", "α/β" and "α + β". The number of proteins (or protein domains) classified in each SCOP fold varies; for example, the SCOP fold "dipeptide transport proteins" contains only one entry, that of D-amino peptidase (PDB, 1HI9). This enzyme is a decamer of identical subunits, each with 88 and 68 residues in α-helical and β-strand conformations, respectively. Because the number of residues in this SCOP fold is too small to yield statistically meaningful results, we selected only those SCOP folds that contained at least 2,000 residues in an α-helical, β-strand, or other conformation (Table 1). Consequently, we identified 39 (2,029 PDB entries) of 899 SCOP folds for the dataset of α-helices and 24 (1,879 PDB entries) of 899 SCOP folds for the dataset of β-strands. Twelve of these SCOP folds, such as the TIM barrel and Rossmann fold (both examples of α/β proteins), were included in the datasets for both α-helices and β-strands, and consequently we used 51 SCOP folds. We also identified 39 of the 51 folds for the dataset of other conformation as a control. SCOP release 1.73 was used for all calculations.
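
The residue-count filter described above can be sketched as follows. This is a minimal illustration rather than the authors' pipeline: the function name `select_folds`, the nested-dictionary layout and the state labels are assumptions made for the example, and the counts are only those quoted in the text and in Table 1.

```python
MIN_RESIDUES = 2000  # threshold stated in the text


def select_folds(counts, min_residues=MIN_RESIDUES):
    """Return, for each conformation, the folds with enough residues.

    counts: {fold_name: {"helix": n, "strand": n, "other": n}}
    (hypothetical layout); a missing conformation counts as zero.
    """
    selected = {"helix": [], "strand": [], "other": []}
    for fold, per_state in counts.items():
        for state in selected:
            if per_state.get(state, 0) >= min_residues:
                selected[state].append(fold)
    return {state: sorted(folds) for state, folds in selected.items()}


example = {
    # Residue counts taken from Table 1.
    "Thioredoxin fold": {"helix": 3845, "strand": 2400, "other": 4840},
    # Per-subunit counts for the single entry quoted in the text.
    "Dipeptide transport proteins": {"helix": 88, "strand": 68},
}
datasets = select_folds(example)
```

With these inputs, only the thioredoxin fold passes the threshold for each of the three datasets.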

Table 1 SCOP folds included in the dataset of α-helices, β-strands and other conformation

SCOP class / fold                                            NE_j     N_j      α-helix     β-strand        Other

All-α proteins
4-helical cytokines                                            24   3,452   2,272 (50)            -            -
Alpha/alpha toroid                                             25  10,766   5,410 (32)            -   4,611 (56)
Alpha-alpha superhelix                                         48  11,524   8,213 (48)            -   3,270 (76)
Cytochrome P450                                                19   7,811   4,042 (43)            -   3,008 (63)
EF Hand-like                                                   56   7,106   4,244 (54)            -   2,739 (76)
Ferritin-like                                                  40   7,261   5,354 (35)            -            -
Four-helical up-and-down bundle                                45   5,800   4,344 (55)            -            -
Globin-like                                                    30   4,223   3,161 (56)            -            -
HD-domain/PDEase-like                                          12   3,487   2,248 (43)            -            -
Heme oxygenase-like                                            15   3,364   2,369 (43)            -            -
L-aspartase-like                                               14   6,699   4,154 (32)            -   2,217 (45)
Nuclear receptor ligand-binding domain                         20   4,747   3,221 (48)            -            -

α/β proteins
Adenine nucleotide alpha hydrolase-like                        33   7,219   3,398 (49)            -   2,716 (60)
ALDH-like                                                      11   5,122   2,198 (43)            -            -
Alpha/beta-Hydrolases                                          74  23,468   9,192 (42)   3,985 (23)  10,291 (54)
ClpP/crotonase                                                 19   4,736   2,369 (38)            -            -
Flavodoxin-like                                                96  17,354   7,063 (49)   3,218 (24)   7,073 (58)
HAD-like                                                       47  10,900   4,853  (?)            -   4,231 (59)
Isocitrate/Isopropylmalate dehydrogenase-like                  17   6,267   2,869 (45)            -   2,239 (56)
NAD(P)-binding Rossmann-fold domains                          100  26,500  12,116 (40)   4,312 (18)  10,072 (57)
Nucleotide-diphospho-sugar transferases                        24   6,359   2,297 (47)            -   2,723 (61)
Periplasmic binding protein-like II                            46  14,225   5,593 (49)   2,916 (29)   5,716 (59)
Phosphorylase/hydrolase-like                                   36   9,424   3,277 (43)   2,105 (22)   4,042 (57)
P-loop containing nucleoside triphosphate hydrolases          170  39,025  16,859 (52)   7,524 (27)  14,882 (64)
PLP-dependent transferases                                     77  30,112  13,025 (43)   4,509 (16)  12,578 (43)
Restriction endonuclease-like                                  30   6,407   2,631 (50)            -   2,447 (64)
Ribokinase-like                                                24   7,234   3,029 (42)            -   2,589 (59)
Ribonuclease H-like motif                                      33   6,808   2,832 (49)            -   2,585 (70)
S-adenosyl-L-methionine-dependent methyltransferases           92  23,315   8,567 (49)   5,538 (31)   9,210 (62)
SIS domain                                                     14   4,150   2,171 (38)            -            -
Thioredoxin fold                                               82  11,085   3,845 (59)   2,400 (29)   4,840 (67)
TIM beta/alpha-barrel                                         261  81,525  35,904 (46)  12,411 (11)  33,210 (53)
Tryptophan synthase beta subunit-like PLP-dependent enzymes     ?       ?            ?            -            ?
UDP-Glycosyltransferase/glycogen phosphorylase                 16   6,976   3,324 (44)            -   2,562 (56)

α + β proteins
Acyl-CoA N-acyltransferases (Nat)                              63  11,104   3,872 (57)   3,376 (33)   3,856 (74)
Cysteine proteinases                                           34   9,122   3,027 (44)            -   4,248 (64)
Ferredoxin-like                                               174  19,761   6,048 (57)   5,212 (36)   8,501 (73)
Protein kinase-like (PK-like)                                  57  16,585   6,566 (44)   2,783 (40)   7,236 (65)
Thioesterase/thiol ester dehydrase-isomerase                   38   5,319            -   2,090 (43)            -
Zincin-like                                                    33   9,786   4,555 (44)            -   4,099 (63)

All-β proteins
6-bladed beta-propeller                                         ?       ?            -            ?            ?
Concanavalin A-like lectins/glucanases                          ?       ?            -            ?            ?
Double-stranded beta-helix                                     68  15,315            -   5,425 (27)   7,075 (62)
Galactose-binding domain-like                                  29   4,522            -   2,191 (39)   2,077 (71)
Immunoglobulin-like beta-sandwich                             117  13,954            -   6,176 (47)   7,225 (78)
Lipocalins                                                     41   6,422            -   3,120 (45)   2,371 (75)
OB-fold                                                        58   7,072            -   2,665 (41)   3,418 (75)
PH domain-like barrel                                          54   7,041            -   2,434 (42)   3,393 (83)
Single-stranded right-handed beta-helix                        27   8,817            -   3,600 (31)   4,484 (59)
Trypsin-like serine proteases                                  41   9,275            -   3,290 (29)   4,827 (63)

Each entry in the α-helix, β-strand and Other columns gives the number of residues followed, in parentheses, by the fraction (%) of exposed residues. "-" marks an empty cell; "?" marks a value that is illegible in the source.

1. NE_j: The number of PDB entries in the fold, j.
2. N_j: The number of residues in the fold, j.
3. N^α_j: The number of residues of α-helices in the fold, j.
4. N^β_j: The number of residues of β-strands in the fold, j.
5. N^o_j: The number of residues of other conformation in the fold, j.
6. f^α_exp: Fraction (%) of exposed residues in α-helices in the fold, j.
7. f^β_exp: Fraction (%) of exposed residues in β-strands in the fold, j.
8. f^o_exp: Fraction (%) of exposed residues in other conformation in the fold, j.

Determining amino acid propensities in the secondary structure elements

The propensity, $P^{S}_{ij}$, of amino acid $i$ for SCOP fold $j$ in α-helices ($P^{\alpha}_{ij}$) or β-strands ($P^{\beta}_{ij}$) was calculated as follows:

$$P^{S}_{ij} = \frac{f^{S}_{ij}}{f_{i}} \qquad (1)$$

where $f^{S}_{ij}$ is the frequency of amino acid $i$ occurring in SCOP fold $j$ in the secondary structure $S$ ($f^{S}_{ij} = N^{S}_{ij}/N^{S}_{j}$), and $f_{i}$ is the frequency of amino acid $i$ occurring in the protein ($f_{i} = N_{i}/N_{t}$). $N^{S}_{ij}$ and $N^{S}_{j}$ are the number of amino acid $i$ and the number of all amino acids, respectively, in the secondary structure $S$ in SCOP fold $j$. $N_{i}$ is the number of amino acid $i$, and $N_{t}$ is the total number of amino acids, in all 51 SCOP folds. The propensity is therefore the frequency of amino acid $i$ in a secondary structure in a specific fold divided by the frequency of amino acid $i$ in all proteins. If $P^{\alpha}_{ij} = 1$, amino acid $i$ is equally frequent in the α-helical region and in the protein. When $P^{\alpha}_{ij} > 1$, amino acid $i$ is more frequent in the α-helical region than in the protein. The standard deviation for the normalized function, $P_{ij}$, was calculated as follows.



[Equation (2) is not legible in the source text.]    (2)
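
The propensity of equation (1) can be illustrated with a short sketch. It assumes that every residue in the dataset is available as an (amino acid, fold, conformation) tuple; this record layout and the function name `fold_propensities` are hypothetical and chosen only for the example.

```python
from collections import Counter


def fold_propensities(residues, fold, state):
    """Propensity of each amino acid for `state` in `fold` (equation 1).

    residues: list of (amino_acid, fold, state) tuples covering ALL
    residues of ALL folds in the dataset (hypothetical record layout).
    """
    background = Counter(aa for aa, _, _ in residues)       # N_i
    n_total = len(residues)                                  # N_t
    in_state = Counter(
        aa for aa, f, s in residues if f == fold and s == state
    )                                                        # N^S_ij
    n_state = sum(in_state.values())                         # N^S_j
    if n_state == 0:
        raise ValueError("no residues found for this fold and state")
    return {
        aa: (in_state[aa] / n_state) / (background[aa] / n_total)
        for aa in background
    }


# Toy data: Ala is enriched and Gly is depleted in the helices of fold "f1".
toy = (
    [("A", "f1", "helix")] * 3 + [("G", "f1", "helix")] * 1
    + [("A", "f1", "other")] * 1 + [("G", "f1", "other")] * 3
)
p_helix = fold_propensities(toy, "f1", "helix")
```

In the toy data each amino acid makes up half of all residues, so the helix frequencies 3/4 and 1/4 give propensities above and below 1, respectively.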



The secondary structure assignment program DSSP [29] was used for all secondary structure assignments. DSSP assigns the following secondary structure states: H, α-helix; G, 3₁₀-helix; I, 5-helix (π-helix); E, extended strand; B, residue in isolated β-bridge; S, bend; and T, hydrogen-bonded turn. We regarded H (α-helix) and G (3₁₀-helix) as α-helix and E as β-strand; the remaining residues, except T, were defined as other conformation.
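
The mapping from DSSP states to the three conformational classes can be written down directly. The sketch below follows the rule stated above; the function name and the use of `None` for excluded turn residues are illustrative choices, and treating a blank DSSP state as "other" is an assumption.

```python
def classify_dssp(code):
    """Map a one-letter DSSP state to the class used in this study.

    Returns "helix" for H and G, "strand" for E, None for T (turns are
    excluded from all three datasets) and "other" for everything else
    (I, B, S and, by assumption, a blank state).
    """
    if code in ("H", "G"):
        return "helix"
    if code == "E":
        return "strand"
    if code == "T":
        return None
    return "other"


classes = [classify_dssp(c) for c in "HGEIBST "]
```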

Defining exposed and buried residues in the secondary 
structure elements 

Amino acid residues were defined as "exposed" when >20% of the total accessible surface area was exposed to solvent. This threshold level of 20% was determined as the value that could classify an almost equal number of residues as exposed (1,241 residues) or buried (1,276 residues) in β-strands for 37 soluble β-barrel proteins. The total accessible surface area for a given amino acid, X, was calculated for the tripeptide G-X-G using DSSP [29]. The frequencies of exposed, $f^{S,\mathrm{exp}}_{ij}$, and buried, $f^{S,\mathrm{bur}}_{ij}$, residues were calculated for each amino acid in an α-helical or β-strand conformation for each SCOP fold. The propensities for an α-helical or β-strand conformation for each SCOP fold for exposed residues, $P^{S,\mathrm{exp}}_{ij}$, and buried residues, $P^{S,\mathrm{bur}}_{ij}$, were obtained by dividing $f^{S,\mathrm{exp}}_{ij}$ and $f^{S,\mathrm{bur}}_{ij}$ by the frequencies of the exposed and buried residues in all SCOP folds, $f^{\mathrm{exp}}_{i}$ and $f^{\mathrm{bur}}_{i}$, respectively:

$$P^{S,\mathrm{exp}}_{ij} = \frac{f^{S,\mathrm{exp}}_{ij}}{f^{\mathrm{exp}}_{i}} \qquad (3)$$

$$P^{S,\mathrm{bur}}_{ij} = \frac{f^{S,\mathrm{bur}}_{ij}}{f^{\mathrm{bur}}_{i}} \qquad (4)$$
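
The exposed/buried classification can be sketched as below. The 20% threshold is the one given in the text; the function names are hypothetical, and the reference areas used in the example are placeholder numbers, not the G-X-G values used in the study.

```python
def is_exposed(asa, reference_asa, threshold=0.20):
    """True when more than `threshold` of the reference area is accessible.

    asa: accessible surface area of the residue in the structure.
    reference_asa: total accessible surface area of the same amino acid
    in a G-X-G tripeptide (to be supplied by the caller).
    """
    if reference_asa <= 0:
        raise ValueError("reference_asa must be positive")
    return asa / reference_asa > threshold


def exposed_fraction(residues, reference):
    """Fraction of exposed residues among (amino_acid, asa) pairs."""
    if not residues:
        raise ValueError("no residues given")
    n_exposed = sum(is_exposed(asa, reference[aa]) for aa, asa in residues)
    return n_exposed / len(residues)


# Placeholder reference areas, for illustration only.
reference = {"K": 200.0, "V": 150.0}
strand = [("K", 120.0), ("V", 15.0), ("V", 45.0), ("K", 10.0)]
fraction = exposed_fraction(strand, reference)
```

The comparison is strict, so a residue at exactly 20% relative accessibility is counted as buried, in line with the ">20%" wording.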



Population difference test 

The Fisher-Irwin population test can be used to determine statistically significant differences between $P_{ij}$ values for different fold types. Because the n value (the sum of the numbers of amino acid $i$ from both populations) was large, the exact Fisher-Irwin test values were not calculated. Instead, a large-sample approximation was used [30].



[Equation (5), which defines the test variable Z, is not legible in the source text.]    (5)



The difference in $P_{ij}$ between two populations was considered significant, and the populations were considered to be different, when the test variable Z was >1.25, which corresponds to a 90% confidence level.
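
A large-sample comparison of two proportions can be sketched as follows. The exact form of the test variable used in this work is not reproduced here; the sketch assumes the textbook pooled two-proportion statistic, and the function name, argument names and example counts are hypothetical. Only the cutoff of 1.25 is taken from the text.

```python
import math

Z_CUTOFF = 1.25  # significance cutoff stated in the text


def two_proportion_z(k1, n1, k2, n2):
    """Absolute pooled z statistic for two proportions (assumed form).

    k1, k2: occurrences of one amino acid in the two populations.
    n1, n2: total residues in the two populations.
    """
    if n1 <= 0 or n2 <= 0:
        raise ValueError("population sizes must be positive")
    p1, p2 = k1 / n1, k2 / n2
    pooled = (k1 + k2) / (n1 + n2)
    se = math.sqrt(pooled * (1.0 - pooled) * (1.0 / n1 + 1.0 / n2))
    if se == 0.0:
        return 0.0  # both proportions are 0 or 1: no detectable difference
    return abs(p1 - p2) / se


# Hypothetical counts of one amino acid in the helices of two folds.
z = two_proportion_z(300, 3000, 200, 3000)
differs = z > Z_CUTOFF
```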

Results and discussion 

Amino acid propensities for the α-helical or β-strand conformation

For individual amino acids, a $P^{\alpha}$ of <0.9 denotes an α-helix breaker, a $P^{\alpha}$ of >1.1 denotes an α-helix-favored amino acid, and values between 0.9 and 1.1 denote that the amino acid is neutral in this regard [31]. The same principle applies to $P^{\beta}$. The amino acid propensities calculated using our dataset ($P^{\alpha}_{i}$ and $P^{\beta}_{i}$) are shown in Table 2. Their standard deviations ranged from 0.001 to 0.004. The results are in good agreement with previous reports [1,6,10].
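
The three-way classification can be expressed as a small function. The thresholds 0.9 and 1.1 are those given above; the function name and the labels are illustrative, and the example values are the total α-helix propensities of Ala, Phe and Gly from Table 2.

```python
def classify_propensity(p, low=0.9, high=1.1):
    """Label a propensity as disfavored (<low), favored (>high) or neutral."""
    if p < low:
        return "disfavored"
    if p > high:
        return "favored"
    return "neutral"


# Total alpha-helix propensities from Table 2.
helix_total = {"A": 1.41, "F": 1.00, "G": 0.44}
labels = {aa: classify_propensity(p) for aa, p in helix_total.items()}
```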

We also calculated the amino acid propensities for exposed and buried residues ($P^{\mathrm{exp}}_{i}$ and $P^{\mathrm{bur}}_{i}$) in the secondary structural elements (Table 2). For α-helices, the three mean propensities $P^{\alpha}_{i}$, $P^{\alpha,\mathrm{exp}}_{i}$ and $P^{\alpha,\mathrm{bur}}_{i}$ have similar trends. On the other hand, the mean propensities for exposed residues ($P^{\beta,\mathrm{exp}}_{i}$) and buried residues ($P^{\beta,\mathrm{bur}}_{i}$) for β-strands differ significantly (Table 2). It is especially interesting that Lys and Arg, but not the two other charged residues, Asp and Glu, are preferred as exposed residues in β-strands. Not surprisingly, all charged amino acids are disfavored as buried residues in β-strands. Buried regions of β-strands disfavor charged amino acids, whereas the α-helix can tolerate them.

Table 2 Mean amino acid propensities for α-helix and β-strand conformations

Amino acid  α-helix                       β-strand
            Exposed   Buried    Total     Exposed   Buried    Total
            residues  residues  residues  residues  residues  residues
V           0.83      0.89      0.91      2.31      1.57      2.00
I           0.96      1.01      1.04      2.02      1.39      1.79
L           1.16      1.27      1.28      1.18      0.93      1.15
M           1.03      1.29      1.26      1.01      0.84      1.01
P           0.48      0.41      0.44      0.49      0.42      0.40
A           1.43      1.37      1.41      0.48      0.72      0.75
C           0.63      0.85      0.85      1.24      1.07      1.36
F           0.88      0.99      1.00      1.50      1.10      1.4
Y           0.91      0.98      0.98      1.71      1.12      1.37
W           0.87      1.09      1.07      1.90      0.91      1.23
Q           1.34      1.21      1.26      0.96      0.82      0.72
S           0.74      0.80      0.76      0.86      0.85      0.81
T           0.72      0.84      0.78      1.58      1.08      1.21
N           0.74      0.77      0.73      0.71      0.76      0.63
H           0.90      0.85      0.87      1.15      0.98      0.99
D           0.91      0.73      0.82      0.61      0.76      0.55
K           1.25      1.13      1.17      1.14      0.98      0.76
E           1.51      1.25      1.39      0.89      0.86      0.65
R           1.31      1.13      1.21      1.27      0.82      0.85
G           0.28      0.59      0.44      0.41      0.81      0.67

As previously reported in statistical studies, charged amino acids (including Lys and Arg) yield low values for $P^{\beta}$ [1,6,10,13], which is in agreement with the mean propensities, $P^{\beta}_{i}$, determined in the present work. Our results, however, show that Lys and Arg have relatively high $P^{\beta,\mathrm{exp}}_{i}$ values for exposed residues, a property that is masked when mean propensities are compared. In our dataset, the fraction of exposed residues in β-strands is low (29%) compared with that in α-helices (46%). Most residues in β-strands are buried inside proteins and covered by α-helices or loop regions; exposed residues are thus less frequently encountered in β-strands, and their contributions to the mean $P^{\beta}_{i}$ are therefore small. Jiang and coworkers [10] have suggested that the hydrophobicities of amino acid side chains are the key determinant of β-sheet structures, but our data suggest that this holds for buried residues but not for exposed residues in β-sheet structures. Minor and Kim [27] measured the propensity of the 20 amino acids for β-sheet formation in a variant of the IgG-binding domain from protein G, which has four antiparallel β-strands. Amino acid substitutions were made at a guest site on the solvent-exposed surface of the center strand. The propensities from those experiments show a strong correlation with the logarithmic $P^{\beta,\mathrm{exp}}_{i}$ values obtained here (R = 0.82), although they show a weaker correlation with our logarithmic $P^{\beta,\mathrm{bur}}_{i}$ values (R = 0.63). Furthermore, there is poor correlation between the propensities determined by Minor and Kim [27] and those of Chou and Fasman [1]. These results show that the preference for β-strands differs for exposed and buried sites.

Fold dependency of amino acid propensities for α-helices

The propensities of amino acid $i$ in the helical region of fold $j$, $P^{\alpha}_{ij}$, and the β-strand region of fold $j$, $P^{\beta}_{ij}$, were thus calculated for 39 and 24 SCOP folds, respectively (Figure 1). Their standard deviations range from 0.01 to 0.05. With the exception of Met, Cys, Trp, Asn, Asp and His for $P^{\alpha}_{ij}$ and with the exception of Met, Pro and Cys for $P^{\beta}_{ij}$, the population of amino acids differed (>90% confidence level) for more than one pair of folds.

In particular, a wide range of $P^{\alpha}_{ij}$ values was obtained for the aromatic residues Phe (0.66-2.00) and Tyr (0.58-1.89), depending on fold type, although the mean propensity over all folds is approximately 1.0 for these amino acids (Figure 1A and Table 2). The propensities of the charged residues Lys (0.65-1.56) and Arg (0.80-1.71) also varied widely depending on the fold. On the other hand, in >80% of SCOP folds, Leu and Glu are favored in the α-helical conformation, whereas Val, Pro, Ser, Thr, Asn, Asp and Gly are disfavored. Ala is favored in the α-helical conformation in the majority of the folds (79%) but is disfavored in two folds (Protein kinase-like and 4-helical cytokines). In particular, the propensity of Ala for the "4-helical cytokines" fold is quite low ($P^{\alpha}$ = 0.64). Met, Cys, Trp and His do not show a fold-type population difference at the >90% confidence level in any pair of folds, although their propensities vary widely among the various folds. Therefore, we did not further assess these amino acids.

Richardson et al. showed that Ala is not favored at the ends of α-helices [7], suggesting that a short α-helix does not favor Ala. The mean α-helix length of the 4-helical cytokines fold is, however, the third longest among the 39 folds (the longest and the second longest are those of the "Ferritin-like" and "Four-helical up-and-down bundle" folds, respectively). We then calculated, for each amino acid, the correlation coefficient between the mean α-helix length and the amino acid propensity; all were smaller than 0.4. This result indicates that there is no relationship between the mean length of α-helices and the helical propensity of any amino acid.

Figure 1 Amino acid propensities for each SCOP fold. Box plots of amino acid propensities for each SCOP fold for α-helices (A) and β-strands (B). Each box encloses 50% of the data with the median value displayed as a line. The top and bottom of the box mark the limits of ±25% of the data. The lines extending from the top and bottom of each box mark the minimum and maximum values within the data set that fall within an acceptable range. Any value outside of this range, called an outlier, is displayed as an individual point. Underlining of certain residues (one-letter code) on the horizontal axis denotes that the results from the Fisher-Irwin population proportion test indicated that differences in propensities are statistically significant between folds.



Engel et al. showed that most helices are amphiphilic [7,12], suggesting that the propensities for α-helix depend on the exposed residue fraction. We therefore examined the correlations between the exposed residue fraction and the frequency of amino acids in α-helices. No amino acid showed a strong correlation (R < -0.7 or R > 0.7) between the exposed residue fraction and the amino acid frequency, although the charged residues Lys and Asp have relatively strong positive correlations (R_K = 0.66, R_D = 0.54). In contrast, the correlation coefficients of Glu and Arg (also charged amino acids) are small (R_E = 0.26, R_R = 0.07).

Figure 2 also presents propensities for exposed and buried amino acids for each SCOP fold. For the exposed regions of α-helices (Figure 2A), fewer than ten amino acids show a population difference with 90% confidence for at least one pair of folds. This probably results from the fact that the dataset was limited to exposed residues. Glu ($P^{\alpha,\mathrm{exp}}_{ij}$: 1.0-1.92) is favored in exposed regions (Figure 2A), whereas Leu ($P^{\alpha,\mathrm{bur}}_{ij}$: 0.97-1.88) is favored in buried regions (Figure 2B), for more than 80% of the folds. Pro and Gly are extremely disfavored in both exposed and buried regions for more than 92% of the folds. The propensities of Ala in the exposed and buried regions of α-helices show a tendency similar to that of $P^{\alpha}_{ij}$. Ala is favored in the α-helical conformation in both exposed and buried regions for 72% and 79% of the folds, respectively, whereas Ala is disfavored in 8% and 13% of the folds when exposed or buried, respectively. For the "4-helical cytokines" fold, the propensities of Ala in both exposed and buried regions are also low ($P^{\alpha,\mathrm{exp}}$ = 0J2 and $P^{\alpha,\mathrm{bur}}$ = 0.60). A wide range of $P^{\alpha,\mathrm{bur}}_{ij}$ values was obtained for the aromatic residues Phe and Tyr, depending on fold type (Figure 2B), as was the case for $P^{\alpha}_{ij}$.

Fold dependency of amino acid propensities for β-strands

As shown in Figure 1B, a wide range of $P^{\beta}_{ij}$ values was obtained for Trp (0.45-2.22), Thr (0.73-1.87), Lys (0.46-1.45) and Arg (0.51-1.42) depending on fold type. For Lys, although $P^{\beta}_{ij}$ was <0.9 in 18 of 24 folds (mean value of $P^{\beta}_{ij}$ = 0.79), three folds (the lipocalins fold, OB-fold, and protein kinase-like fold) yielded $P^{\beta}_{ij}$ values >1.2, which differed from those of other folds at the 90% confidence level. These three folds are "all-β" or "α + β", and all have largely exposed β-strands, whereas β-strands are usually covered by α-helical or loop regions, especially in "α/β" proteins (Table 1). It has long been thought that β-strands prefer hydrophobic residues [1,6,10]; however, it now appears that largely exposed β-sheet structures prefer hydrophilic residues such as Lys. In contrast, the four amino acids Val, Ile, Phe and Tyr are favored ($P^{\beta}_{ij}$ > 1.1) in β-strands of more than 80% of folds, with Val (1.40-2.68) and Ile (1.17-2.33) having particularly high propensities in this regard. The six amino acids Pro, Ala, Asn, Asp, Glu and Gly are disfavored ($P^{\beta}_{ij}$ < 0.9) in β-strands for more than 80% of folds, and Pro (0.16-0.71) and Asp (0.22-0.91) have quite low propensities.

Figure 2 Amino acid propensities for exposed and buried residues. Box plots of amino acid propensities for each SCOP fold for exposed (A) and buried (B) residues in α-helices and for exposed (C) and buried (D) residues in β-strands. The propensities for β-strands for Trp in the "PH domain-like barrel" SCOP fold and for Lys in the "Protein kinase-like" SCOP fold were out of range (4.3 in C and 3.8 in D, respectively) and are not shown. Underlining of certain residues on the horizontal axis denotes that the results from the Fisher-Irwin population proportion test indicated that differences in propensities are statistically significant between folds.

The exposed residue fractions ranged from about 10% to 46% for the 24 folds (Table 1), and Glu and Lys show strong positive correlations between the amino acid propensities and the exposed residue fractions of β-strands in each fold (R_E = 0.76, R_K = 0.73). Gln, Arg and Ile also have relatively strong correlations, although the correlation for Ile is negative (R_Q = 0.67, R_R = 0.5, R_I = -0.68). In contrast to the strong positive correlation found for Glu, there is no correlation for the other negatively charged amino acid, Asp. The exposed residue fraction appears to be one of the major factors governing the charged amino acid composition of folds for β-strands.

For residues exposed in a β-strand (Figure 2C), a wide range of $P^{\beta,\mathrm{exp}}_{ij}$ values was obtained for Ser (0.42-1.69), Lys (0.84-1.58) and Arg (0.68-1.85). A wide range of $P^{\beta,\mathrm{bur}}_{ij}$ values was obtained for Cys (0.61-2.61), Phe (0.66-1.83), Tyr (0.64-1.92), Trp (0.31-1.77) and His (0.41-1.87) for residues buried in a β-strand (Figure 2D).

The $P^{\beta,\mathrm{exp}}_{ij}$ values of Val, Ile, Phe, Tyr, Trp and Thr are high ($P^{\beta,\mathrm{exp}}_{ij}$ > 1.1) for more than 75% of folds, indicating that these amino acids, which have a β-branched or aromatic side chain, are favored in the exposed regions of β-strands across fold types. In contrast, the amino acids that are disfavored in β-strands in all folds are Pro (0.22-0.87), Ala (0.28-0.70) and Gly (0.23-0.88) for exposed regions, and Pro (0.12-0.87) for buried regions. It is interesting that the $P^{\beta,\mathrm{exp}}_{ij}$ values of Ala are low in all folds ($P^{\beta,\mathrm{exp}}_{ij}$ < 0.7), indicating that an exposed site on a β-strand is an extremely unfavorable position for Ala as well as for Pro and Gly. These strong tendencies support the view that backbone solvation is a major factor determining thermodynamic β-propensities [32].

Correlations between amino acid propensities and 
SCOP fold 

To investigate the factors that determine the fold dependence of the amino acid propensity for the secondary structures, correlation coefficients were calculated using amino acid propensities obtained from 39 SCOP folds for α-helices (Figure 3A) and 24 SCOP folds for β-strands (Figure 3B). Figure 4, for example, shows the relationships between the propensities of Glu and Lys for α-helices and β-strands. Each data point represents a fold in which more than 2,000 residues are found in α-helices (Figure 4A) or β-strands (Figure 4B). For β-strands (Figure 4B), these two amino acid propensities have a correlation coefficient of 0.70, which suggests that folds rich in Glu are likely to also be rich in Lys. In contrast, for α-helices (Figure 4A) no significant correlation was observed. For β-strands, "α/β" proteins (□ in Figure 4B) show low propensities for Glu and Lys, whereas lipocalins and OB-folds (both "all-β", + in Figure 4B) show higher propensities for Glu and Lys. For "α+β" proteins (△ in Figure 4B), there is no correlation between the propensities of Glu and Lys. The correlation coefficients for "all-β" proteins and "α/β" proteins are 0.83 and 0.86, respectively.

Figure 3 Correlation coefficients between amino acid propensities. Correlation coefficients between amino acid propensities for α-helices (A) and β-strands (B). Strong negative correlations (R<-0.7) are indicated by dark blue, and strong positive correlations (R>0.7) are indicated by dark red. Comparatively strong negative correlations (R<-0.5) are indicated by light blue and positive correlations (R>0.5) by pink.



Overall, there are more strong correlations (R < -0.7 or R > 0.7) for β-strands than for α-helices (Figure 3). For example, four strong positive correlations and five strong negative correlations are observed for β-strands, but there are only two strong pairwise correlations for α-helices (Ala and Gly, Tyr and Trp). Most of the positive correlations for β-strands involve pairs of amino acids with similar physicochemical characters (shown along the diagonal in Figure 3B), such as Val and Ile, Tyr and Trp, Ser and Gln/Thr/Asn, Asn and Thr, and Glu and Lys/Arg. In contrast, most of the negative correlations for β-strands involve pairs of amino acids with different physicochemical characters, such as Val and Tyr/Trp/Gln/Ser, Ile and Trp/Gln/Ser/Glu/Arg, Leu and Ser/Thr/Asn, Met and Asn, and Ala and Lys.
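The tallies of strong correlations quoted above follow from thresholding a pairwise correlation matrix. The sketch below assumes a hypothetical table that maps each amino acid to its list of per-fold propensities and applies the cut-offs from the Figure 3 legend (R > 0.7 or R < -0.7 for strong, R > 0.5 or R < -0.5 for comparatively strong). The toy numbers are invented for illustration only.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def classify(r):
    """Bin a correlation coefficient using the Figure 3 legend cut-offs."""
    if r > 0.7:
        return "strong positive"
    if r < -0.7:
        return "strong negative"
    if r > 0.5:
        return "comparatively strong positive"
    if r < -0.5:
        return "comparatively strong negative"
    return "weak"


def strong_pairs(propensities):
    """Return {(aa1, aa2): R} for amino acid pairs with R > 0.7 or R < -0.7."""
    out = {}
    for a, b in combinations(sorted(propensities), 2):
        r = pearson(propensities[a], propensities[b])
        if classify(r).startswith("strong"):
            out[(a, b)] = r
    return out


# Hypothetical per-fold propensities (four folds) for three amino acids.
toy = {
    "Val": [1.9, 1.7, 1.5, 1.2],
    "Ile": [1.8, 1.7, 1.4, 1.1],
    "Ser": [0.6, 0.8, 0.9, 1.2],
}
pairs = strong_pairs(toy)
```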

Interestingly, the aromatic amino acid Phe shows low correlations with Trp and Tyr for both α-helices and β-strands, although strong positive correlations between Trp and Tyr are observed for both α-helices and β-strands.

Figure 4 Relationship between the amino acid propensities. Amino acid propensities, P, for Glu and Lys for each SCOP fold for α-helices (A; R = 0.32) and β-strands (B; R = 0.70). The SCOP classes are: all-α proteins (○), α/β proteins (□), α+β proteins (△) and all-β proteins (+).

Correlations between SCOP fold and propensities for 
exposed or buried amino acids 

We also calculated correlation coefficients for amino acid propensities of exposed and buried residues for α-helices (Figure 5), β-strands (Figure 6) and other conformations (data not shown). Although amino acid propensities for α-helices have two strong correlations when all residues are considered (Figure 3A), there is no strong correlation for exposed (Figure 5A) or buried (Figure 5B) residues in α-helices. The strong positive correlation between Trp and Tyr for all residues was absent for exposed residues, but a weak positive correlation was observed for buried residues. These results indicate that a fold that favors Trp in the interior of α-helices also favors Tyr there. Again, Phe had no correlation with Trp or Tyr for exposed or buried residues. The positive correlations among Ser, Asn and Thr, and the negative correlations between Ser/Thr and Glu, were observed only for exposed residues. Although some new correlations were observed, their values were relatively low for α-helices. For other conformations, no strong correlation was observed for either exposed or buried residues.

Correlation for buried amino acids in β-strands

In contrast, for β-strands, most of the correlations shown in Figure 3B are also strong for exposed (Figure 6A) and buried (Figure 6B) residues. The strong negative correlations between Val/Ile and Tyr/Trp/Gln were observed for buried but not exposed residues. In other words, a fold type that prefers Val or Ile does not prefer Tyr, Trp or Gln, especially at buried sites.

By visually inspecting buried residues in β-strands of the SCOP fold "concanavalin A-like lectins/glucanases" (concanavalin A), we found, in addition to buried Tyr and Trp residues, many polar amino acids such as Gln, Ser or Thr, and charged amino acids such as Glu, Lys or Arg, involved in H-bonds with each other to counterbalance their polarity in the hydrophobic environment. For the buried residues, we calculated the correlation coefficients between the combined frequencies of hydrophobic amino acids (Val, Ile and Leu) and some polar amino acids (Table 3 and Figure 7). The correlation coefficients calculated from the frequencies are the same as those calculated from the propensities, and frequencies make the amino acid occurrences easier to interpret. The combined frequencies of buried Trp, Tyr and Gln have a strong correlation (R = -0.87) with those of the hydrophobic amino acids (Val, Ile and Leu). The inclusion of Ser in the group with Trp, Tyr and Gln strengthened the correlation to R = -0.93 (Figure 7). The fact that the correlation coefficients for individual pairs of Val/Ile/Leu and Tyr/Trp/Gln/Ser range only from -0.19 to -0.75 indicates synergy in the correlation of the combined frequencies for β-strands that does not exist for α-helices or other conformations (Table 3). The synergy between these amino acid groups suggests that the amino acids within the same group can be exchanged. For example, in a fold type where Leu is preferred for buried residues, Ile will also be preferred. Thus, at buried sites, fold types with many aliphatic residues (Val, Ile and Leu) contain low quantities of Tyr, Trp, Gln and Ser. Figure 7 also shows that "all-β" proteins tend to have a higher content of Tyr, Trp, Gln and Ser, whereas "α/β" proteins have a higher content of aliphatic amino acids at buried sites. The top six folds for the content of Tyr, Trp, Gln and Ser at buried sites in β-strands are "all-β" proteins and have two large β-sheets packed together (lipocalins, concanavalin A, 6-bladed beta-propeller (6-bb-propeller), galactose-binding domain-like (Gbd), double-stranded β-helix (DS β-helix), and immunoglobulin-like beta-sandwich folds (Ig)). Other "all-β" proteins that consist of only one small β-sheet or a small β-barrel have a small hydrophobic core. The H-bonds between the buried side chains may be necessary for correct alignment of two large β-sheets in particular.

Figure 5 Correlation coefficients between α-helix propensities for exposed residues and buried residues. Correlation coefficients between α-helix propensities for exposed residues (A) and buried residues (B). Strong negative correlations (R<-0.7) are indicated by dark blue, and positive correlations (R>0.7) are indicated by dark red. Comparatively strong negative correlations (R<-0.5) are indicated by light blue and positive correlations (R>0.5) by pink.
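The combined-frequency comparison for buried β-strand residues can be sketched as follows: for each fold, sum the counts of the residues in each group, divide by the total number of buried β-strand residues in that fold, and correlate the two fractions across folds. The per-fold counts below are hypothetical placeholders, and `group_fraction` is an illustrative helper rather than the procedure used in this study.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def group_fraction(counts, group):
    """Fraction of residues in `counts` belonging to `group` (one-letter codes)."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("fold has no residues")
    return sum(counts.get(aa, 0) for aa in group) / total


# Hypothetical counts of buried beta-strand residues for four folds.
buried_counts = {
    "fold_a": {"V": 300, "I": 250, "L": 200, "W": 20, "Y": 40, "Q": 10, "S": 30, "A": 150},
    "fold_b": {"V": 250, "I": 200, "L": 150, "W": 40, "Y": 70, "Q": 30, "S": 60, "A": 200},
    "fold_c": {"V": 200, "I": 150, "L": 100, "W": 60, "Y": 100, "Q": 50, "S": 90, "A": 250},
    "fold_d": {"V": 150, "I": 120, "L": 80, "W": 80, "Y": 120, "Q": 60, "S": 140, "A": 250},
}

# Combined frequencies per fold, then their correlation across folds.
f_vil = [group_fraction(c, "VIL") for c in buried_counts.values()]
f_wyqs = [group_fraction(c, "WYQS") for c in buried_counts.values()]
r_combined = pearson(f_wyqs, f_vil)
```

Comparing `r_combined` with the coefficients obtained for single amino acid pairs is one way to see the synergy described in the text: a grouped correlation that is stronger than any of its pairwise components.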



Correlation for exposed amino acids in β-strands

Negative correlations between Ile/Leu and Ser/Thr/Asn were observed for the exposed residues (Figure 6A), although the correlations between Ile and Thr/Asn were not observed when exposed and buried residues were calculated together (Figure 3B). Negative correlations were also observed between Glu and Ser/Asn and between Arg and Thr. We examined the correlation of the combined frequencies for these exposed amino acids in β-strands, as shown in Table 4. This result shows that strong correlations exist among the frequencies of certain hydrophobic amino acids (Ile, Leu), charged amino acids (Glu, Lys, Arg), and polar amino acids (Ser, Thr, Asn) in the exposed regions of β-strands. It is interesting that the frequencies of hydrophobic (Ile, Leu) and charged (Glu, Lys, Arg) amino acids correlate negatively with those of polar amino acids (Ser, Thr, Asn). A common feature of Ile, Leu, Glu, Lys and Arg is that they have relatively long side chains, including more than two hydrophobic methylene groups, whereas Ser, Thr and Asn have short side chains.

Figure 6 Correlation coefficients between β-sheet propensities for exposed residues and buried residues. Correlation coefficients between β-sheet propensities for exposed residues (A) and buried residues (B). Strong negative correlations (R<-0.7) are indicated by dark blue, and positive correlations (R>0.7) are indicated by dark red. Comparatively strong negative correlations (R<-0.5) are indicated by light blue and positive correlations (R>0.5) by pink.

Figure 8 shows a strong correlation between the combined frequencies of Ser, Thr and Asn and those of Ile, Leu, Glu, Lys and Arg (R = -0.90). For the exposed regions of β-strands, it is clear that in all "α/β" proteins and all "α+β" proteins, Ile, Leu, Glu, Lys and Arg are preferred and Ser, Thr and Asn are disfavored. Fold types that prefer Ser, Thr or Asn have a relatively low content of Ile, Leu, Glu, Lys or Arg, and they are "all-β" proteins. Figure 8 also shows the widespread distribution of the folds of "all-β" proteins. For the two SCOP folds DS β-helix and OB-fold of "all-β" proteins, the residues Ile, Leu, Glu, Lys or Arg are preferred in the exposed regions of the β-strands. These fold types have twisted and bent β-strands. Some Cα atoms in the β-strands are positioned at the bottom of the narrow and deep valley formed by the twisted and bent β-strands (Figure 9D and E). At such positions, the short, polar side chain of Ser, Thr or Asn is unable to reach the solvent, so amino acids with long side chains are favored. Much the same is true for "α/β" proteins (Figure 9F and G). In "α/β" proteins the β-sheet is twisted and covered by α-helices, leaving only narrow spaces for the residues at the ends of the β-strands to reach solvent. In contrast, the two SCOP folds concanavalin A and single-stranded right-handed β-helix (SS β-helix) have a remarkably high content of Ser, Thr and Asn in the exposed regions of β-strands and have largely exposed and flat β-sheets (Figure 9A, B and C). Figure 9C shows that Ser, Asn and Thr are dominant in the flat β-sheet, and they do not make significant contact with each other. These results suggest that amino acid composition in the exposed regions of β-strands governs the formation of a twist in β-sheets.

Table 3 Correlation coefficients for buried residues

                      α-helix   β-strand   Other
f_WYQ vs. f_VI         -0.51     -0.87     -0.24
f_WYQ vs. f_VIL        -0.22     -0.87     -0.26
f_WYQS vs. f_VIL       -0.31     -0.93     -0.52

Figure 7 Relationship between the frequencies of buried residues. Relationship between the frequencies of buried Val, Ile and Leu residues, f_VIL, and buried Trp, Tyr, Gln and Ser residues, f_WYQS, in β-strands. The SCOP classes are: α/β proteins (□), α+β proteins (△) and all-β proteins (+).

Wang et al. [33] showed that isolated β-strands in molecular dynamics simulations are not twisted, suggesting that the stabilization of the twist must be due to inter-strand interactions. Another computer simulation study found that inter-strand interactions by side chains induce a twist and that β-branched side chains are important for twist formation [34]. On the other hand, Koh et al. [35] and Bosco et al. [36] used statistical analyses to show that β-sheet structure is mainly determined by the backbone, and that the contribution of side chains is small. This indicates that twisting is an inherent property of a polypeptide chain, implying that a β-strand should twist regardless of its amino acid sequence. However, some folds have a large, flat β-sheet, such as the SCOP folds concanavalin A and SS β-helix. Previous studies have targeted only the twisted β-strand and have not focused on the flat β-sheet. Our results suggest that the amino acid composition in the exposed regions of β-strands may be related to the twist and bend of the strand, and that side chain interactions are also an important factor for β-strand twisting. An intuitive explanation is that the long side chains of Leu, Ile, Lys, Arg and Glu in the exposed regions come close together to form the hydrophobic core, resulting in the formation of a twist and/or bend in β-strands. In contrast, the side chains of Ser, Thr and Asn have low hydrophobicities and are short, so that the hydrophobic interactions between the side chains are weak and produce a flat β-sheet. Therefore, it seems that the strain within a β-sheet is one of the major factors governing amino acid propensities of folds for β-strands.

Table 4 Correlation coefficients for solvent-exposed residues

                      α-helix   β-strand   Other
f_IL vs. f_STN         -0.21     -0.79     -0.51
f_EKR vs. f_STN        -0.61

Figure 8 Relationship between the frequencies of exposed residues. Relationship between the frequencies of exposed Ile, Leu, Glu, Lys and Arg residues, f_ILEKR, and exposed Ser, Thr and Asn residues, f_STN, in β-strands. The SCOP classes are: α/β proteins (□), α+β proteins (△) and all-β proteins (+).

The types of β-sheets and the amino acid propensity

The folds can be classified by their β-sheet type into three groups: parallel, antiparallel and mixed. For the "all-β" and "α+β" protein classes, the β-sheets of all folds used in this study are completely antiparallel, except for SS β-helix, which has a completely parallel β-sheet. The folds of the "α/β" protein class have completely or mainly parallel β-sheets. The β-sheets of the three folds "Flavodoxin-like", "NAD(P)-binding Rossmann-fold domains" and "TIM beta/alpha-barrel" are completely parallel, whereas "Periplasmic binding protein-like II" and "Thioredoxin fold" have mixed β-sheets.

Figure 9 Amino acid residues on β-strands of three folds. Amino acid residues in β-strands of concanavalin A (A, B and C, PDB ID: 1IOA), DS β-helix (D and E, PDB ID: 1ODM), and TIM barrel (F and G, PDB ID: 1SFS). The residues for α-helices are colored magenta, and those for β-strands are colored yellow. The side chains of residues in β-strands are colored by atom type (nitrogen: blue, oxygen: red, carbon: grey) in C

For the exposed residues of β-strands (Figure 8), the points for the folds of the "all-β" protein class are widely distributed, although all of these folds except SS β-helix have completely antiparallel β-sheets. Furthermore, the folds of the "α/β" protein class have amino acid compositions different from that of SS β-helix, although they all have parallel β-sheets. Figure 7 shows that the points for the folds of the "all-β" protein class are widely distributed and that the point for SS β-helix is in the center of the graph. The residue fractions (f_VIL) of the three folds that have completely parallel β-sheets are also widely distributed (51.4, 47.2 and 42.7%).

These results indicate that the correlations found in Figures 7 and 8 cannot be explained by the types of β-sheets. Consequently, we think that the propensities do not depend on the types of β-sheets.

Robustness of the dataset 

We checked the robustness of our results using folds containing more than 1,500 but fewer than 2,000 residues, which are not included in the dataset used in this study: six folds for α-helices and eight folds for β-strands. For β-strands, strong correlations were again observed for buried residues (R_WYQS-VIL = -0.81) and for exposed residues (R_ILEKR-STN = -0.78). There are no strong correlations for buried residues (R_WYQS-VIL = -0.64) or for exposed residues (R_ILEKR-STN = -0.48) in α-helices. These results are consistent with those obtained for the dataset containing more than 2,000 residues. Therefore, the results presented here seem to be independent of the dataset selection.
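The robustness check amounts to partitioning folds by the number of residues observed in a given secondary structure and repeating the correlation analysis on the held-out partition. A minimal sketch of the partitioning step, with hypothetical fold names and residue counts, might look like this:

```python
def partition_folds(residue_counts, main_min=2000, holdout_min=1500):
    """Split folds into a main set (more than main_min residues) and a
    hold-out set (more than holdout_min and fewer than main_min residues).

    Folds outside both ranges, including a fold with exactly main_min
    residues, are left out of both sets.
    """
    main, holdout = [], []
    for fold, n in sorted(residue_counts.items()):
        if n > main_min:
            main.append(fold)
        elif holdout_min < n < main_min:
            holdout.append(fold)
    return main, holdout


# Hypothetical numbers of beta-strand residues per fold.
beta_counts = {
    "fold_a": 5200,
    "fold_b": 1800,
    "fold_c": 1510,
    "fold_d": 900,
    "fold_e": 2000,
}
main_set, holdout_set = partition_folds(beta_counts)
```

Because the two sets do not overlap, agreement between the correlations computed on each of them is evidence that the result does not hinge on one particular selection of folds.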

Conclusion 

The amino acid propensities for secondary structures were investigated for each SCOP fold. The helix propensities calculated for exposed and buried residues are also similar to each other. For β-sheet propensities, however, propensities calculated for exposed residues are remarkably different from those of buried residues, which are similar to those calculated for all residues because β-sheets tend to be located in the interior of proteins.

We also detected correlations between amino acid compositions in β-strands. At buried sites, the content of Tyr, Trp, Gln and Ser correlates negatively with the content of the aliphatic amino acids Val, Ile and Leu. All-β proteins tend to have a higher content of Tyr, Trp, Gln and Ser, whereas α/β proteins tend to have a higher content of aliphatic amino acids at buried sites. In all-β proteins, the H-bonds between buried side chains may be necessary for correct alignment of two large β-sheets. For exposed residues, a fold with a high content of Ile, Leu, Glu, Lys and Arg tends to have a low content of Ser, Thr and Asn. Generally, α/β proteins have twisted and bent β-strands and favor longer side chains at exposed sites.

These findings should be useful for the design of β-sheets. They are especially effective when structural information is available, such as whether a residue is exposed or buried, whether two large β-sheets are packed together, whether a β-sheet has α-helices on at least one side, and whether a β-strand is twisted. Hecht and coworkers have succeeded in designing de novo proteins with binary patterning techniques, in which polar and nonpolar amino acids are placed at desired sites along the sequence by synthesizing DNA with degenerate codons [37]. If one wishes to design a de novo protein library of SS β-helix, for example, one should consider biasing exposed sites on β-strands in favor of Ser, Thr and Asn rather than Glu, Lys and Arg, because the frequency of Ser, Thr and Asn is relatively high, and conversely the frequency of Ile, Leu, Glu, Lys and Arg is low, at exposed sites on β-strands of SS β-helix folds (Figure 8).
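To make the library-design remark concrete, the sketch below expands a degenerate codon, written with IUPAC nucleotide codes, into the set of amino acids it encodes under the standard genetic code. The codon `AVC` is an illustrative choice made here, not one taken from this study or from reference [37]; it covers exactly Asn, Thr and Ser.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code; codons ordered TTT, TTC, TTA, TTG, TCT, ..., GGG.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO)
}

# IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def encoded_amino_acids(degenerate_codon):
    """Return the set of one-letter amino acids ('*' = stop) encoded by a
    three-letter degenerate codon."""
    if len(degenerate_codon) != 3:
        raise ValueError("a codon must have exactly three positions")
    choices = [IUPAC[base] for base in degenerate_codon.upper()]
    return {CODON_TABLE["".join(codon)] for codon in product(*choices)}


# Illustrative degenerate codon biased toward short polar side chains.
polar_short = encoded_amino_acids("AVC")
```

A helper of this kind makes it easy to check, before ordering oligonucleotides, that a candidate degenerate codon covers the intended residues and introduces no stop codon.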
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